4th October 2020

Challenge - Analyse the mental health of India during COVID using Twitter

Team Name: zeroth

Team Members

  • Dilip Kumar (NIT Jalandhar)
  • Vishal Raj (NIT Kurukshetra)
  • Ekta Sonwani (NIT Jalandhar)
  • Amit Kumar Bind (NIT Jalandhar)

Types of Visualisation Used

  • Bar Plots
  • Pie Charts
  • Scatter Plots
  • Word Clouds
  • Curves
  • Word Trees
  • Donut Charts
  • Maps

Project Work

1. Preprocessing and Cleaning of Data

In [1]:
import os
import re

import html as ihtml
import pandas as pd
import emoji

import nltk
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

Load Data

In [2]:
df = pd.read_csv('data.csv')
tweet=df['text']
len(df)
Out[2]:
496448

1. Remove NA Records

In [3]:
totallen = len(df)
# drop rows containing NA values
print("Removing NA records")
df = df.dropna()
df = df.reset_index(drop=True)
print("NA records removed: ", totallen - len(df))
Removing NA records
NA records removed:  159110

2. Remove Duplicate Records

In [4]:
# drop duplicate records
print("Removing duplicate records")
df = df.drop_duplicates()
# reset index after dropping
df = df.reset_index(drop=True)
# note: the count below is cumulative (it also includes the NA rows dropped above)
print("Duplicate records removed: ", totallen - len(df))
Removing duplicate records
Duplicate records removed:  486719

3. Convert emojis into equivalent Text

In [5]:
def preprocess_tweet(row):
    text = row['text']
    text=emoji.demojize(text)
    return text
df['text']=df.apply(preprocess_tweet,axis=1)
df['text']=df['text'].str.replace(":","")
df.head()
Out[5]:
text location date time
0 Curve flattening? Kenya records 48 new virus c... IN 22-Sep 5:08:45
1 Victoria and Melbourne Covid trend map where c... Erbil, Iraq 22-Sep 5:08:34
2 NSW and Sydney Covid trend map where coronavir... Melbourne, Australia 22-Sep 5:08:33
3 IT’S BAKE OFF DAY! raising_hands_medium-light_... Melbourne, Australia 22-Sep 5:06:02
4 @DanielAndrewsMP The Liberal party bots are ou... Fareham 22-Sep 5:05:34

4. Extract Hashtags

In [6]:
df['hashtags']=df.text.str.findall(r'#.*?(?=\s|$)')
df.head()
Out[6]:
text location date time hashtags
0 Curve flattening? Kenya records 48 new virus c... IN 22-Sep 5:08:45 []
1 Victoria and Melbourne Covid trend map where c... Erbil, Iraq 22-Sep 5:08:34 []
2 NSW and Sydney Covid trend map where coronavir... Melbourne, Australia 22-Sep 5:08:33 []
3 IT’S BAKE OFF DAY! raising_hands_medium-light_... Melbourne, Australia 22-Sep 5:06:02 []
4 @DanielAndrewsMP The Liberal party bots are ou... Fareham 22-Sep 5:05:34 []
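
The hashtag pattern above matches a `#` followed by a lazy run of characters up to the next whitespace or end of string. A minimal sketch on a made-up tweet (not from the dataset):

```python
import re

# same pattern as in the cell above: '#' plus a lazy run of characters,
# stopped by the lookahead at whitespace or end of string
hashtag_pattern = re.compile(r'#.*?(?=\s|$)')

sample = "Stay safe #COVID19 #StayHome everyone"
tags = hashtag_pattern.findall(sample)
print(tags)  # ['#COVID19', '#StayHome']
```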

5. Extract Mentions

In [7]:
df['mention']=df.text.str.findall(r'(?:(?<=\s)|(?<=^))@.*?(?=\s|$)')
df.head()
Out[7]:
text location date time hashtags mention
0 Curve flattening? Kenya records 48 new virus c... IN 22-Sep 5:08:45 [] [@thestarkenya, @MOH_Kenya]
1 Victoria and Melbourne Covid trend map where c... Erbil, Iraq 22-Sep 5:08:34 [] []
2 NSW and Sydney Covid trend map where coronavir... Melbourne, Australia 22-Sep 5:08:33 [] []
3 IT’S BAKE OFF DAY! raising_hands_medium-light_... Melbourne, Australia 22-Sep 5:06:02 [] []
4 @DanielAndrewsMP The Liberal party bots are ou... Fareham 22-Sep 5:05:34 [] [@DanielAndrewsMP]

6. Apply beautify (strip HTML and URLs)

In [8]:
from bs4 import BeautifulSoup
def beautify(row):
    text = row['text']
    text = BeautifulSoup(ihtml.unescape(text), "lxml").text
    text = re.sub(r"http[s]?://\S+", "", text)
    text = re.sub(r"\s+", " ", text)
    return text
df['text']=df.apply(beautify,axis=1)
df['text']=df['text'].str.replace(":","")
df.head()
Out[8]:
text location date time hashtags mention
0 Curve flattening? Kenya records 48 new virus c... IN 22-Sep 5:08:45 [] [@thestarkenya, @MOH_Kenya]
1 Victoria and Melbourne Covid trend map where c... Erbil, Iraq 22-Sep 5:08:34 [] []
2 NSW and Sydney Covid trend map where coronavir... Melbourne, Australia 22-Sep 5:08:33 [] []
3 IT’S BAKE OFF DAY! raising_hands_medium-light_... Melbourne, Australia 22-Sep 5:06:02 [] []
4 @DanielAndrewsMP The Liberal party bots are ou... Fareham 22-Sep 5:05:34 [] [@DanielAndrewsMP]

7. Remove Mentions

In [9]:
df['text'] = df['text'].replace(re.compile(r'@[A-Z0-9a-z_:]+'), ' ')  # remove @username mentions
df['text'] = df['text'].replace(re.compile(r'^[RT]+'), ' ')  # remove leading RT markers
df['text'] = df['text'].replace(re.compile("[^a-zA-Z]"), " ")  # replace every non-alphabetic character with a space
df.head()
df['text'][0]
Out[9]:
'Curve flattening  Kenya records    new virus cases      recoveries https  t co mnXgUE EnE via   After we ATE CORONA MONEY  someone at the   has been consulting with Darrel Huff          How to lie with statistics   Please just give us a break '

8. Remove Links

In [10]:
print("Removing links")
# remove links and URLs
# note: URLs were already broken apart by the non-alphabetic replacement above,
# so these patterns find little left to match (the output below is unchanged)
df['text'] = df['text'].replace(re.compile(r'((www\.[\S]+)|(https?://[\S]+))'), "")
df['text'] = df['text'].replace(re.compile(r'((\w+\/\/\S+))'), "")
print("Links are removed")

df['text']=df['text'].str.replace("_"," ")
df['text'][0]
Removing links
Links are removed
Out[10]:
'Curve flattening  Kenya records    new virus cases      recoveries https  t co mnXgUE EnE via   After we ATE CORONA MONEY  someone at the   has been consulting with Darrel Huff          How to lie with statistics   Please just give us a break '

9. Remove Special Characters

In [11]:
print("Removing punctuation and special characters")
# remove punctuation and collapse repeated whitespace
df['text'] = df['text'].str.replace(r'[^\w\s]', ' ').str.replace(r'\s\s+', ' ')
print("Punctuation removed...")
Removing punctuation and special characters
Punctuation removed...

10. Convert to Lowercase

In [12]:
print("Converting text to lowercase")
# convert text to lowercase
df['text'] = df['text'].str.lower()
df['text'][0]
Converting text to lowercase
Out[12]:
'curve flattening kenya records new virus cases recoveries https t co mnxgue ene via after we ate corona money someone at the has been consulting with darrel huff how to lie with statistics please just give us a break '

11. Remove Single-Character Words

In [13]:
print("Removing single-character words")
# remove words consisting of a single character
df['text'] = df['text'].replace(re.compile(r"(^| ).( |$)"), " ")
print("Single-character words removed")

df['text'][0]
Removing single-character words
Single-character words removed
Out[13]:
'curve flattening kenya records new virus cases recoveries https co mnxgue ene via after we ate corona money someone at the has been consulting with darrel huff how to lie with statistics please just give us break '

12. Remove Stop Words

In [14]:
stop = stopwords.words('english')
#english words
english_word = set(nltk.corpus.words.words())

print("Removing stop words...")
#remove stop words
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))
print("Stop words removed...")
df['text'][0]
Removing stop words...
Stop words removed...
Out[14]:
'curve flattening kenya records new virus cases recoveries https co mnxgue ene via ate corona money someone consulting darrel huff lie statistics please give us break'
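
The filter above keeps only tokens absent from NLTK's English stop-word list. A minimal sketch of the same idea, using a tiny hand-picked stop-word set instead of NLTK's full list:

```python
# hand-picked mini stop-word set standing in for stopwords.words('english')
stop = {"we", "at", "the", "has", "been", "with", "how", "to", "after", "a", "just", "us"}

text = "after we ate corona money someone at the has been consulting"
# keep only tokens that are not stop words
filtered = ' '.join(word for word in text.split() if word not in stop)
print(filtered)  # 'ate corona money someone consulting'
```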

13. Remove Non-English Words

In [15]:
print("Removing non-English words")
# keep only words found in the NLTK English word corpus
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in x.split() if word in english_word]))
print("Non-English words removed")

df['text'][0]
Removing non-English words
Non-English words removed
Out[15]:
'curve flattening new virus via ate corona money someone consulting huff lie statistics please give us break'

14. Remove Extra Spaces

In [16]:
df['text']=df['text'].str.strip()
df['text'] = df['text'].replace(re.compile(r"(^| ).( |$)"), " ")
df['text'][0]
Out[16]:
'curve flattening new virus via ate corona money someone consulting huff lie statistics please give us break'

15. Remove Tweets with Fewer than Two Words

In [17]:
print("Removing tweets with fewer than two words")
# drop tweets containing fewer than two words (i.e. no space in the text)
df.drop(df[df['text'].str.count(" ") < 1].index, inplace=True)
# reset index after dropping
df = df.reset_index(drop=True)
print("Tweets with fewer than two words removed...")
# cumulative count of all records removed so far
print("records removed so far: ", totallen - len(df))
Removing tweets with fewer than two words
Tweets with fewer than two words removed...
records removed so far:  486756

16. Save Clean Data

In [18]:
print("Writing clean data to preprocessed_data.csv ...")
# write clean data to a new file
df.to_csv('preprocessed_data.csv', index=False, encoding="utf-8")
print("Clean data written to preprocessed_data.csv")

print("total records", len(df))
Writing clean data to preprocessed_data.csv ...
Clean data written to preprocessed_data.csv
total records 9692

2. Model Creation and Training the Model

Preprocess Data

In [19]:
import nltk
import pandas as pd
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, SpatialDropout1D

# nltk.download()
from nltk.tokenize import word_tokenize
data = pd.read_csv("preprocessed_data_train.csv")
data['emotion'] = data['emotion'].str.replace('Disgust','Shame')
data['emotion'] = data['emotion'].str.replace('Guilt','Shame')
data['emotion'] = data['emotion'].str.replace('Happiness','Happy')
data['emotion'] = data['emotion'].str.replace('Scared','Fear')
data['emotion'] = data['emotion'].str.replace('sadness','Sad')
data['emotion'] = data['emotion'].str.replace('anger','Angry')
data['emotion'] = data['emotion'].str.replace('Mad','Angry')
data['emotion'] = data['emotion'].str.replace('surprise','Surprise')
#data.drop(data.index[data['sentiment'] == "sentiment"], inplace = True)
#data.drop(data.index[data['emotion'] == "Powerful"], inplace = True)
#data.drop(data.index[data['emotion'] == "Peaceful"], inplace = True) 
data.emotion.value_counts()
Out[19]:
Happy       5440
Shame       3297
Angry       2884
Fear        2817
Sad         2592
Surprise     857
Peaceful     450
Powerful     381
Name: emotion, dtype: int64

1. Tokenization

In [20]:
# tokenization
max_words = 2000
tokenizer = Tokenizer(num_words=max_words, split=' ')
tokenizer.fit_on_texts(data['text'].values)
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X, maxlen=32)
print(X.shape[1])
32
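
Conceptually, `Tokenizer` ranks words by frequency, maps each text to a sequence of those ranks, and `pad_sequences` left-pads (or truncates) every sequence to the same length. A simplified pure-Python sketch of that behaviour on toy texts (Keras does all of this internally):

```python
from collections import Counter

# toy corpus standing in for data['text']
texts = ["covid cases rise", "covid lockdown", "cases fall"]

# rank words by frequency; index 0 is reserved for padding, as in Keras
counts = Counter(w for t in texts for w in t.split())
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def to_padded(text, maxlen=4):
    # map words to their ranks, then left-pad/truncate to maxlen
    seq = [word_index[w] for w in text.split()]
    return [0] * (maxlen - len(seq)) + seq[-maxlen:]

padded = [to_padded(t) for t in texts]
print(padded)
```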

2. Defining Model

In [21]:
embedding_out_dim = 256
lstm_out_dim = 256

model = Sequential()
model.add(Embedding(max_words, embedding_out_dim, input_length=X.shape[1]))
model.add(LSTM(lstm_out_dim + 1))  # 257 units, matching the summary below
model.add(Dense(8, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 32, 256)           512000    
_________________________________________________________________
lstm (LSTM)                  (None, 257)               528392    
_________________________________________________________________
dense (Dense)                (None, 8)                 2064      
=================================================================
Total params: 1,042,456
Trainable params: 1,042,456
Non-trainable params: 0
_________________________________________________________________
None

3. Splitting data into training and validation data

In [22]:
# data set to train
dummies = pd.get_dummies(data['emotion'])
Y = dummies.values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, random_state = 50)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
(14974, 32) (14974, 8)
(3744, 32) (3744, 8)

4. Creating the Emotion Dictionary

In [23]:
dict_emotion = {}
dict_label = {}
for i in range(len(Y)):
    dict_emotion[data['emotion'][i]] = np.argmax(Y[i])
    dict_label[np.argmax(Y[i])] = data['emotion'][i]
    if len(dict_emotion) == 8:
        print('Break at: ', i)
        break
print(dict_emotion, dict_label)
Break at:  4440
{'Surprise': 7, 'Sad': 5, 'Happy': 2, 'Angry': 0, 'Shame': 6, 'Fear': 1, 'Powerful': 4, 'Peaceful': 3} {7: 'Surprise', 5: 'Sad', 2: 'Happy', 0: 'Angry', 6: 'Shame', 1: 'Fear', 4: 'Powerful', 3: 'Peaceful'}
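
Because `pd.get_dummies` orders its columns alphabetically, the same label-to-index mappings can be built directly from the sorted label set, without scanning `Y`. A sketch reproducing the mapping printed above:

```python
# stand-in list covering the eight emotion labels in the dataset
labels = ['Happy', 'Sad', 'Angry', 'Happy', 'Fear',
          'Shame', 'Surprise', 'Peaceful', 'Powerful']

# pd.get_dummies sorts columns alphabetically, so sorting the unique labels
# yields the same index assignment
classes = sorted(set(labels))
dict_emotion = {name: i for i, name in enumerate(classes)}
dict_label = {i: name for i, name in enumerate(classes)}
print(dict_emotion)
```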

5. Split Off a Validation Set

In [24]:
X_val = X_train[:500]
Y_val = Y_train[:500]
# note: X_val is carved out of X_train, but model.fit below is still given the
# full X_train, so the validation set overlaps the training data; the reported
# validation accuracy is therefore optimistic
partial_X_train = X_train[500:]
partial_Y_train = Y_train[500:]

6. Train the Model

In [25]:
# train the net
batch_size = 512
history = model.fit(X_train,Y_train, 
                    epochs = 50, 
                    batch_size=batch_size,
                    validation_data=(X_val, Y_val))
Epoch 1/50
30/30 [==============================] - 2s 68ms/step - loss: 1.8581 - accuracy: 0.2904 - val_loss: 1.7178 - val_accuracy: 0.3820
Epoch 2/50
30/30 [==============================] - 2s 58ms/step - loss: 1.6219 - accuracy: 0.4030 - val_loss: 1.3960 - val_accuracy: 0.5440
Epoch 3/50
30/30 [==============================] - 2s 58ms/step - loss: 1.2837 - accuracy: 0.5586 - val_loss: 1.1618 - val_accuracy: 0.6120
Epoch 4/50
30/30 [==============================] - 2s 58ms/step - loss: 1.0957 - accuracy: 0.6280 - val_loss: 1.0040 - val_accuracy: 0.6500
Epoch 5/50
30/30 [==============================] - 2s 57ms/step - loss: 1.0125 - accuracy: 0.6507 - val_loss: 0.9449 - val_accuracy: 0.6700
Epoch 6/50
30/30 [==============================] - 2s 60ms/step - loss: 0.9570 - accuracy: 0.6689 - val_loss: 0.8907 - val_accuracy: 0.6880
Epoch 7/50
30/30 [==============================] - 2s 59ms/step - loss: 0.9232 - accuracy: 0.6807 - val_loss: 0.8710 - val_accuracy: 0.7040
Epoch 8/50
30/30 [==============================] - 2s 58ms/step - loss: 0.8909 - accuracy: 0.6894 - val_loss: 0.8129 - val_accuracy: 0.7080
Epoch 9/50
30/30 [==============================] - 2s 58ms/step - loss: 0.8688 - accuracy: 0.6962 - val_loss: 0.7994 - val_accuracy: 0.7140
Epoch 10/50
30/30 [==============================] - 2s 58ms/step - loss: 0.8445 - accuracy: 0.7040 - val_loss: 0.7732 - val_accuracy: 0.7380
Epoch 11/50
30/30 [==============================] - 2s 59ms/step - loss: 0.8294 - accuracy: 0.7101 - val_loss: 0.7920 - val_accuracy: 0.7180
Epoch 12/50
30/30 [==============================] - 2s 60ms/step - loss: 0.8061 - accuracy: 0.7162 - val_loss: 0.7214 - val_accuracy: 0.7560
Epoch 13/50
30/30 [==============================] - 2s 59ms/step - loss: 0.7826 - accuracy: 0.7210 - val_loss: 0.7043 - val_accuracy: 0.7480
Epoch 14/50
30/30 [==============================] - 2s 62ms/step - loss: 0.7565 - accuracy: 0.7341 - val_loss: 0.6963 - val_accuracy: 0.7720
Epoch 15/50
30/30 [==============================] - 2s 63ms/step - loss: 0.7378 - accuracy: 0.7420 - val_loss: 0.6678 - val_accuracy: 0.7720
Epoch 16/50
30/30 [==============================] - 2s 62ms/step - loss: 0.7085 - accuracy: 0.7505 - val_loss: 0.6184 - val_accuracy: 0.7920
Epoch 17/50
30/30 [==============================] - 2s 63ms/step - loss: 0.6850 - accuracy: 0.7598 - val_loss: 0.6069 - val_accuracy: 0.7980
Epoch 18/50
30/30 [==============================] - 2s 63ms/step - loss: 0.6627 - accuracy: 0.7675 - val_loss: 0.5936 - val_accuracy: 0.7920
Epoch 19/50
30/30 [==============================] - 2s 64ms/step - loss: 0.6368 - accuracy: 0.7753 - val_loss: 0.5850 - val_accuracy: 0.8080
Epoch 20/50
30/30 [==============================] - 2s 66ms/step - loss: 0.6187 - accuracy: 0.7832 - val_loss: 0.5441 - val_accuracy: 0.8300
Epoch 21/50
30/30 [==============================] - 2s 66ms/step - loss: 0.5921 - accuracy: 0.7946 - val_loss: 0.5243 - val_accuracy: 0.8200
Epoch 22/50
30/30 [==============================] - 2s 67ms/step - loss: 0.5699 - accuracy: 0.8024 - val_loss: 0.5142 - val_accuracy: 0.8380
Epoch 23/50
30/30 [==============================] - 2s 68ms/step - loss: 0.5629 - accuracy: 0.8050 - val_loss: 0.5019 - val_accuracy: 0.8380
Epoch 24/50
30/30 [==============================] - 2s 71ms/step - loss: 0.5351 - accuracy: 0.8117 - val_loss: 0.4878 - val_accuracy: 0.8440
Epoch 25/50
30/30 [==============================] - 2s 73ms/step - loss: 0.5150 - accuracy: 0.8217 - val_loss: 0.4504 - val_accuracy: 0.8520
Epoch 26/50
30/30 [==============================] - 2s 72ms/step - loss: 0.5045 - accuracy: 0.8254 - val_loss: 0.4324 - val_accuracy: 0.8660
Epoch 27/50
30/30 [==============================] - 2s 72ms/step - loss: 0.4884 - accuracy: 0.8288 - val_loss: 0.4245 - val_accuracy: 0.8680
Epoch 28/50
30/30 [==============================] - 2s 78ms/step - loss: 0.4706 - accuracy: 0.8371 - val_loss: 0.4198 - val_accuracy: 0.8720
Epoch 29/50
30/30 [==============================] - 2s 73ms/step - loss: 0.4604 - accuracy: 0.8408 - val_loss: 0.4130 - val_accuracy: 0.8780
Epoch 30/50
30/30 [==============================] - 2s 75ms/step - loss: 0.4470 - accuracy: 0.8439 - val_loss: 0.3891 - val_accuracy: 0.8700
Epoch 31/50
30/30 [==============================] - 2s 76ms/step - loss: 0.4374 - accuracy: 0.8489 - val_loss: 0.3718 - val_accuracy: 0.8820
Epoch 32/50
30/30 [==============================] - 2s 78ms/step - loss: 0.4219 - accuracy: 0.8524 - val_loss: 0.3558 - val_accuracy: 0.8820
Epoch 33/50
30/30 [==============================] - 2s 74ms/step - loss: 0.4091 - accuracy: 0.8564 - val_loss: 0.3517 - val_accuracy: 0.8840
Epoch 34/50
30/30 [==============================] - 2s 69ms/step - loss: 0.3996 - accuracy: 0.8613 - val_loss: 0.3375 - val_accuracy: 0.8900
Epoch 35/50
30/30 [==============================] - 2s 69ms/step - loss: 0.3934 - accuracy: 0.8636 - val_loss: 0.3328 - val_accuracy: 0.8900
Epoch 36/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3801 - accuracy: 0.8679 - val_loss: 0.3173 - val_accuracy: 0.8940
Epoch 37/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3706 - accuracy: 0.8710 - val_loss: 0.3334 - val_accuracy: 0.8900
Epoch 38/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3678 - accuracy: 0.8726 - val_loss: 0.3030 - val_accuracy: 0.9000
Epoch 39/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3644 - accuracy: 0.8736 - val_loss: 0.3016 - val_accuracy: 0.9020
Epoch 40/50
30/30 [==============================] - 2s 69ms/step - loss: 0.3460 - accuracy: 0.8807 - val_loss: 0.2901 - val_accuracy: 0.9120
Epoch 41/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3386 - accuracy: 0.8817 - val_loss: 0.2743 - val_accuracy: 0.9100
Epoch 42/50
30/30 [==============================] - 2s 72ms/step - loss: 0.3285 - accuracy: 0.8837 - val_loss: 0.2685 - val_accuracy: 0.9100
Epoch 43/50
30/30 [==============================] - 2s 71ms/step - loss: 0.3129 - accuracy: 0.8904 - val_loss: 0.2672 - val_accuracy: 0.9120
Epoch 44/50
30/30 [==============================] - 2s 70ms/step - loss: 0.3006 - accuracy: 0.8954 - val_loss: 0.2545 - val_accuracy: 0.9260
Epoch 45/50
30/30 [==============================] - 2s 71ms/step - loss: 0.2983 - accuracy: 0.8950 - val_loss: 0.2419 - val_accuracy: 0.9180
Epoch 46/50
30/30 [==============================] - 2s 69ms/step - loss: 0.2902 - accuracy: 0.8988 - val_loss: 0.2407 - val_accuracy: 0.9300
Epoch 47/50
30/30 [==============================] - 2s 71ms/step - loss: 0.2807 - accuracy: 0.9024 - val_loss: 0.2293 - val_accuracy: 0.9280
Epoch 48/50
30/30 [==============================] - 2s 69ms/step - loss: 0.2716 - accuracy: 0.9050 - val_loss: 0.2180 - val_accuracy: 0.9200
Epoch 49/50
30/30 [==============================] - 2s 70ms/step - loss: 0.2621 - accuracy: 0.9084 - val_loss: 0.2081 - val_accuracy: 0.9260
Epoch 50/50
30/30 [==============================] - 2s 71ms/step - loss: 0.2549 - accuracy: 0.9105 - val_loss: 0.2101 - val_accuracy: 0.9400

7. Plot Training and Validation Accuracy and Loss

In [26]:
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
In [27]:
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'r', label='Training Accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation Accuracy')
plt.title('Training and validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

8. Validation

In [28]:
# validation
total, correct, false = 0, 0, 0
# print(len(X_val))
for x in range(len(X_val)):
    total += 1
#     print(x)

    result = model.predict(X_val[x].reshape(1, X_test.shape[1]), batch_size=1)[0]
#     print(np.argmax(result), np.argmax(Y_val[x]))

    if np.argmax(result) == np.argmax(Y_val[x]):
        correct += 1

    else:
        false += 1
print("accuracy", correct / total * 100, "%")
# print("negative accuracy", neg_correct / negative_count * 100, "%")
accuracy 94.0 %
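
The per-sample prediction loop above can be replaced by one batched prediction and a vectorized argmax comparison. A sketch with stand-in arrays in place of `model.predict(X_val)` and `Y_val`:

```python
import numpy as np

# stand-in for model.predict(X_val): one softmax row per sample
pred_probs = np.array([[0.1, 0.7, 0.2],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.2, 0.6],
                       [0.5, 0.4, 0.1]])
# stand-in for the one-hot Y_val
Y_val = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0]])

# fraction of samples whose predicted class matches the true class
accuracy = np.mean(np.argmax(pred_probs, axis=1) == np.argmax(Y_val, axis=1))
print("accuracy", accuracy * 100, "%")
```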

9. Making Predictions

In [29]:
data = pd.read_csv('preprocessed_data.csv')
data['text'] = data['text'].str.strip()

data = data.dropna()
data = data.reset_index(drop=True)
data.drop(data[data['text'].str.count(" ") < 1].index, inplace=True)
data = data.reset_index(drop=True)

# tokenization
# note: fitting a fresh Tokenizer here assigns word indices that differ from
# the ones the model was trained on; reusing the tokenizer fitted on the
# training data would be more consistent
max_words = 2000
tokenizer = Tokenizer(num_words=max_words, split=' ')
tokenizer.fit_on_texts(data['text'].values)
X = tokenizer.texts_to_sequences(data['text'].values)

X = pad_sequences(X, maxlen=32)
print(X.shape[1])

#prediction on given data
em=[]
for i in range(len(X)):
    result = model.predict(X[i].reshape(1,X_test.shape[1]), batch_size=1)[0]
    emotion_value = np.argmax(result)
    emotion = dict_label[emotion_value]
    em.append(emotion)

se = pd.Series(em)
data['emotion']=se.values
data['emotion']

ls=list(set(data['emotion']))
ls
for i in ls:
    em_type=data[data['emotion']==i]
    print(i,"Percentage: ",(len(em_type)/len(data))*100)
32
Surprise Percentage:  5.747600866783613
Happy Percentage:  29.27458466618512
Fear Percentage:  14.993292745846661
Sad Percentage:  18.563615725931275
Powerful Percentage:  1.8986688680218762
Angry Percentage:  11.9595500980291
Peaceful Percentage:  1.351769683211227
Shame Percentage:  16.210917345991128
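
The percentage loop above is equivalent to a normalized value count. A sketch with a stand-in list in place of `data['emotion']`:

```python
from collections import Counter

# stand-in for data['emotion']
emotions = ['Happy', 'Happy', 'Sad', 'Fear', 'Happy', 'Sad']

# count each label once, then convert counts to percentages
counts = Counter(emotions)
percentages = {e: c / len(emotions) * 100 for e, c in counts.items()}
print(percentages)
```

In pandas the same result comes from `data['emotion'].value_counts(normalize=True) * 100`, which avoids repeatedly filtering the DataFrame.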

10. Saving the Predictions

In [30]:
print("Writing predictions to predicted_new.csv ...")
# write the predicted data to a new file
data.to_csv('predicted_new.csv', index=False, encoding="utf-8")
print("Predictions written to predicted_new.csv")

print("total records", len(data))
Writing predictions to predicted_new.csv ...
Predictions written to predicted_new.csv
total records 9691

3. Analysis and Visualisations

Import libraries

In [31]:
import pandas as pd
import numpy as np
import csv
import re #regular expression
from textblob import TextBlob
import string
import preprocessor as p
import nltk
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import * 
from PIL import Image
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer,CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import datetime
import calendar
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import itertools
import collections
from collections import Counter
from palettable.colorbrewer.qualitative import Pastel1_7
import matplotlib.cbook as cbook
#plots
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots

Load data

In [32]:
df = pd.read_csv('predicted_new.csv')
country_code=pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv') 

Preprocess

In [33]:
c_name=list(country_code.COUNTRY)
lc_name=country_code.COUNTRY.str.lower()
df = pd.read_csv('predicted_new.csv')
df['location'] = df['location'].str.lower()
df.loc[df['location'].str.contains('india'), 'location'] = 'India'
# caution: the bare substring 'in' is over-broad -- it also matches locations
# such as 'china' or 'indonesia' -- so this step over-counts India
df.loc[df['location'].str.contains('in'), 'location'] = 'India'


for i in range(len(lc_name)):
    df.loc[df['location'].str.contains(lc_name[i]), 'location'] = c_name[i]

cs=set(country_code.COUNTRY)
mcs=set(df.location)
#print(len(set(with_country_name.location)))
with_country_name=df[df['location'].isin(list(country_code['COUNTRY']))]
with_country_name.location
with_country_name['count']= with_country_name.location.map(with_country_name.location.value_counts())
all_tweet_location=pd.merge(with_country_name,country_code,left_on="location",right_on="COUNTRY",how="left")

unique_count=all_tweet_location[['location','count','CODE']]
#with_country_name.count
unique_count =unique_count.drop_duplicates()
#reset index after dropping
unique_count = unique_count.reset_index(drop=True)
unique_count =unique_count.nlargest(10,['count'])

total=df['emotion'].count()
df_happy=df[df['emotion']=='Happy'].count()
df_sad=df[df['emotion']=='Sad'].count()
df_angry=df[df['emotion']=='Angry'].count()
df_shame=df[df['emotion']=='Shame'].count()
df_fear=df[df['emotion']=='Fear'].count()
df_surprise=df[df['emotion']=='Surprise'].count()

df.head()
c:\program files\python37\lib\site-packages\pandas\core\strings.py:1952: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  return func(self, *args, **kwargs)
Out[33]:
text location date time hashtags mention emotion
0 curve flattening new virus via ate corona mone... India 22-Sep 5:08:45 [] ['@thestarkenya', '@MOH_Kenya'] Sad
1 covid trend map rising falling Iraq 22-Sep 5:08:34 [] [] Happy
2 covid trend map rising falling Australia 22-Sep 5:08:33 [] [] Happy
3 bake day raising medium light skin tone else t... Australia 22-Sep 5:06:02 [] [] Shame
4 liberal party force even though even covid pos... fareham 22-Sep 5:05:34 [] ['@DanielAndrewsMP'] Fear
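
A safer variant of the substring matching above uses a word-boundary regex, so `in` only matches as a standalone token rather than inside words like `china`. A sketch with toy locations standing in for `df['location']`:

```python
import re

# word boundaries (\b) keep 'in' from matching inside other words
pattern = re.compile(r'\b(india|in)\b')

locations = ['new delhi, india', 'china', 'dublin, ireland', 'pune, in']
normalized = ['India' if pattern.search(loc) else loc for loc in locations]
print(normalized)  # ['India', 'china', 'dublin, ireland', 'India']
```

The same pattern works directly with pandas via `df['location'].str.contains(pattern)`.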
In [34]:
country_code.head()
Out[34]:
COUNTRY GDP (BILLIONS) CODE
0 Afghanistan 21.71 AFG
1 Albania 13.40 ALB
2 Algeria 227.80 DZA
3 American Samoa 0.75 ASM
4 Andorra 4.80 AND

1. Top 10 Countries with the Most Tweets

In [35]:
fig = go.Figure(go.Bar(
    x=unique_count['location'][0:],y=unique_count['count'],
    marker={'color': unique_count['count'][:10], 
    'colorscale': 'blues'},  
    text=unique_count['count'][:10],
    textposition = "outside",
))
fig.update_layout(title_text='Top Countries with most tweets',xaxis_title="Countries",
                  yaxis_title="Number of Tweets",template="plotly_dark",height=700,title_x=0.5)

fig.show()

2. Map of worldwide tweets counts

In [36]:
df_happy=all_tweet_location[all_tweet_location['emotion']=='Happy']
df_sad=all_tweet_location[all_tweet_location['emotion']=='Sad']
df_angry=all_tweet_location[all_tweet_location['emotion']=='Angry']
df_shame=all_tweet_location[all_tweet_location['emotion']=='Shame']
df_fear=all_tweet_location[all_tweet_location['emotion']=='Fear']

fig = go.Figure(data=go.Choropleth(
    locations = all_tweet_location['CODE'],
    z = all_tweet_location['count'],
    text = all_tweet_location['location'],
    colorscale = 'rainbow', 
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = '# of Tweets',
))

fig.update_layout(
#     title_text='Tweets over the world - ({} - {})'.format(df['date'].sort_values()[0].strptime("%d/%m/%Y"),
#                                                        df['date'].sort_values().iloc[-1].strptime("%d/%m/%Y")),title_x=0.5,
    title_text='Tweets over the world - ({} - {})'.format(all_tweet_location.date.min(),all_tweet_location.date.max()),title_x=0.5,

    geo=dict(
        showframe=True,
        showcoastlines=False,
        projection_type='equirectangular',
    )
)


fig.show()

Average length of tweets over different emotion sentiments

In [37]:
df['text_length']=df['text'].str.split(" ").str.len()
print("Average length of Happy Emotion Sentiment tweets : {}".format(round(df[df['emotion']=='Happy']['text_length'].mean(),2)))
print("Average length of Sad Emotion Sentiment tweets : {}".format(round(df[df['emotion']=='Sad']['text_length'].mean(),2)))
print("Average length of Angry Emotion Sentiment tweets : {}".format(round(df[df['emotion']=='Angry']['text_length'].mean(),2)))
print("Average length of Shame Emotion Sentiment tweets : {}".format(round(df[df['emotion']=='Shame']['text_length'].mean(),2)))
print("Average length of Fear Emotion Sentiment tweets : {}".format(round(df[df['emotion']=='Fear']['text_length'].mean(),2)))
Average length of Happy Emotion Sentiment tweets : 12.69
Average length of Sad Emotion Sentiment tweets : 10.77
Average length of Angry Emotion Sentiment tweets : 12.85
Average length of Shame Emotion Sentiment tweets : 13.63
Average length of Fear Emotion Sentiment tweets : 14.07

3. N-grams over Tweets

Over worldwide tweets

In [38]:
df.text=df.text.str.strip()
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
def ngram_df(corpus,nrange,n=None):
    vec = CountVectorizer(stop_words = 'english',ngram_range=nrange).fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0) 
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
    total_list=words_freq[:n]
    df=pd.DataFrame(total_list,columns=['text','count'])
    return df
unigram_df=ngram_df(df.text,(1,1),20)
bigram_df=ngram_df(df.text,(2,2),20)
trigram_df=ngram_df(df.text,(3,3),20)
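
At its core, what `ngram_df` computes for bigrams is a frequency count of adjacent word pairs across the corpus. A pure-Python sketch of that idea (CountVectorizer additionally lowercases and drops English stop words):

```python
from collections import Counter

# toy corpus standing in for df.text
corpus = ["covid cases rise fast", "covid cases fall", "new covid cases rise"]

# count every adjacent word pair in every document
bigrams = Counter()
for doc in corpus:
    words = doc.split()
    bigrams.update(zip(words, words[1:]))

print(bigrams.most_common(3))
```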
In [39]:
fig = make_subplots(
    rows=3, cols=1,subplot_titles=("Unigram over worldwide tweets","Bigram over worldwide tweets",'Trigram over worldwide tweets'),
    specs=[[{"type": "scatter"}],
           [{"type": "scatter"}],
           [{"type": "scatter"}]
          ])

fig.add_trace(go.Bar(
    y=unigram_df['text'][::-1],
    x=unigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=unigram_df['count'],
    textposition = "outside",
    orientation="h",
    name="Unigrams",
),row=1,col=1)

fig.add_trace(go.Bar(
    y=bigram_df['text'][::-1],
    x=bigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=bigram_df['count'],
    name="Bigrams",
    textposition = "outside",
    orientation="h",
),row=2,col=1)

fig.add_trace(go.Bar(
    y=trigram_df['text'][::-1],
    x=trigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=trigram_df['count'],
    name="Trigrams",
    orientation="h",
    textposition = "outside",
),row=3,col=1)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_layout(title_text='Top N Grams',xaxis_title=" ",yaxis_title=" ",
                  showlegend=False,title_x=0.5,height=1200,template="plotly_dark")
fig.show()

Over India tweets

In [40]:
df_in=df[df.location=='India']
unigram_df=ngram_df(df_in.text,(1,1),20)
bigram_df=ngram_df(df_in.text,(2,2),20)
trigram_df=ngram_df(df_in.text,(3,3),20)
fig = make_subplots(
    rows=3, cols=1,subplot_titles=("Unigram over India tweets","Bigram over India tweets",'Trigram over India tweets'),
    specs=[[{"type": "scatter"}],
           [{"type": "scatter"}],
           [{"type": "scatter"}]
          ])

fig.add_trace(go.Bar(
    y=unigram_df['text'][::-1],
    x=unigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=unigram_df['count'],
    textposition = "outside",
    orientation="h",
    name="Unigrams",
),row=1,col=1)

fig.add_trace(go.Bar(
    y=bigram_df['text'][::-1],
    x=bigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=bigram_df['count'],
    name="Bigrams",
    textposition = "outside",
    orientation="h",
),row=2,col=1)

fig.add_trace(go.Bar(
    y=trigram_df['text'][::-1],
    x=trigram_df['count'][::-1],
    marker={'color': "blue"},  
    text=trigram_df['count'],
    name="Trigrams",
    orientation="h",
    textposition = "outside",
),row=3,col=1)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_layout(title_text='Top N-grams over India tweets',xaxis_title=" ",yaxis_title=" ",
                  showlegend=False,title_x=0.5,height=1200,template="plotly_dark")
fig.show()

4. Violin Plot of Text Length Distribution

Distribution of Text length Over the Worldwide tweets

In [41]:
fig = go.Figure(data=go.Violin(y=df['text_length'], box_visible=True, line_color='black',
                               meanline_visible=True, fillcolor='royalblue', opacity=0.6,
                               x0='Tweet Text Length'))

fig.update_layout(yaxis_zeroline=False,title="Distribution of Text length over worldwide tweets",template='ggplot2')
fig.show()
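The `text_length` column plotted here is derived earlier in the notebook; assuming it is simply the character count of each tweet, it can be recreated with a vectorised string operation (the `df_demo` frame below is an illustrative stand-in):

```python
import pandas as pd

df_demo = pd.DataFrame({'text': ["short tweet", "a somewhat longer tweet"]})
# Character count per tweet; use .str.split().str.len() for word counts instead.
df_demo['text_length'] = df_demo['text'].str.len()
```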

Distribution of Text length Over the India tweets

In [42]:
fig = go.Figure(data=go.Violin(y=df_in['text_length'], box_visible=True, line_color='black',
                               meanline_visible=True, fillcolor='royalblue', opacity=0.6,
                               x0='Tweet Text Length'))

fig.update_layout(yaxis_zeroline=False,title="Distribution of Text length over India Tweets",template='ggplot2')
fig.show()

5. Violin Plot of Emotion Sentiment Text Length Distribution

Distribution of Emotion Sentiment Text length Over the Worldwide tweets

In [43]:
fig = go.Figure()
fig.add_trace(go.Violin(y=df[df['emotion']=='Happy']['text_length'],fillcolor='yellow', opacity=0.6,name="Happy",
                               x0='Happy')
             )


fig.add_trace(go.Violin(y=df[df['emotion']=='Sad']['text_length'], line_color='black',
                               fillcolor='blue', opacity=0.6,name="Sad",
                               x0='Sad')
             )

fig.add_trace(go.Violin(y=df[df['emotion']=='Angry']['text_length'], line_color='black',
                               fillcolor='red', opacity=0.6,name="Angry",
                               x0='Angry')
             )

fig.add_trace(go.Violin(y=df[df['emotion']=='Shame']['text_length'], line_color='black',
                               fillcolor='grey', opacity=0.6,name="Shame",
                               x0='Shame')
             )

fig.add_trace(go.Violin(y=df[df['emotion']=='Fear']['text_length'], line_color='black',
                               fillcolor='purple', opacity=0.6,name="Fear",
                               x0='Fear')
             )
fig.update_layout(title_text="Violin - Tweet Length over worldwide tweets",title_x=0.5)

fig.show()

Distribution of Emotion Sentiment Text length Over the India tweets

In [44]:
df_in=df[df.location=='India']
fig = go.Figure()
fig.add_trace(go.Violin(y=df_in[df_in['emotion']=='Happy']['text_length'],fillcolor='yellow', opacity=0.6,name="Happy",
                               x0='Happy')
             )


fig.add_trace(go.Violin(y=df_in[df_in['emotion']=='Sad']['text_length'], line_color='black',
                               fillcolor='blue', opacity=0.6,name="Sad",
                               x0='Sad')
             )

fig.add_trace(go.Violin(y=df_in[df_in['emotion']=='Angry']['text_length'], line_color='black',
                               fillcolor='red', opacity=0.6,name="Angry",
                               x0='Angry')
             )

fig.add_trace(go.Violin(y=df_in[df_in['emotion']=='Shame']['text_length'], line_color='black',
                               fillcolor='grey', opacity=0.6,name="Shame",
                               x0='Shame')
             )

fig.add_trace(go.Violin(y=df_in[df_in['emotion']=='Fear']['text_length'], line_color='black',
                               fillcolor='purple', opacity=0.6,name="Fear",
                               x0='Fear')
             )
fig.update_layout(title_text="Violin - Tweet Length over India tweets by emotion",title_x=0.5)

fig.show()
6. Most Mentioned Users

Top mentions over the worldwide tweets

In [45]:
words=df['mention'].tolist()
wd=[]
for i in words:
    if len(i)>4:
        # strip list syntax: brackets, quotes, '@' signs and spaces
        for ch in "[]@' ":
            i=i.replace(ch,'')
        i=i.strip()
        if len(i.split())>0:
            wd+=i.split(',')
        else: 
            continue

mention=Counter(wd).keys()   # unique mentions
count=Counter(wd).values()   # frequency of each mention
d = {'mention': list(mention), 'count': list(count)}
dt = pd.DataFrame(data=d)

dt =dt.nlargest(11,['count'])

x=list(dt['count'])
x.reverse()
y=list(dt['mention'])
y.reverse()
fig = go.Figure(go.Bar(
            x=x,
            y=y,
            orientation='h'))
fig.update_layout(title_text='Top mentions over worldwide tweets',title_x=0.5)
fig.show()
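The chained `replace` calls above undo the string form of a Python list by hand. A more robust alternative, assuming the `mention` column holds stringified lists such as `"['@WHO', '@PMOIndia']"`, is to parse each cell with `ast.literal_eval`:

```python
import ast
from collections import Counter

rows = ["['@WHO', '@PMOIndia']", "[]", "['@WHO']"]  # sample 'mention' cells
counts = Counter()
for raw in rows:
    # Parse the stringified list back into a real list, then drop the '@'.
    for mention in ast.literal_eval(raw):
        counts[mention.lstrip('@')] += 1
```

`counts.most_common(10)` then gives the top mentions directly, with no bracket-stripping edge cases.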
Top mentions over the India tweets

In [46]:
df_in=df[df['location']=='India']
words= df_in['mention'].tolist()
wd=[]
for i in words:
    if len(i)>4:
        # strip list syntax: brackets, quotes, '@' signs and spaces
        for ch in "[]@' ":
            i=i.replace(ch,'')
        i=i.strip()
        if len(i.split())>0:
            wd+=i.split(',')
        else: 
            continue

mention=Counter(wd).keys()   # unique mentions
count=Counter(wd).values()   # frequency of each mention

d = {'mention': list(mention), 'count': list(count)}
dt = pd.DataFrame(data=d)
dt =dt.nlargest(10,['count'])
x=list(dt['count'])
x.reverse()
y=list(dt['mention'])
y.reverse()
fig = go.Figure(go.Bar(
            x=x,
            y=y,
            orientation='h'))
fig.update_layout(title_text='Top mentions over India tweets',title_x=0.5)
fig.show()

7. Tweet counts of emotion sentiments

Tweet counts of different emotions worldwide

In [47]:
df_happy=df[df['emotion']=='Happy'].count()
df_sad=df[df['emotion']=='Sad'].count()
df_angry=df[df['emotion']=='Angry'].count()
df_shame=df[df['emotion']=='Shame'].count()
df_fear=df[df['emotion']=='Fear'].count()
df_surprise=df[df['emotion']=='Surprise'].count()
width = 0.50  
plt.style.use('ggplot')
plt.figure(figsize=(10,8))
plt.bar(["Happy","Sad","Angry","Shame","Fear"],[df_happy.text,df_sad.text,df_angry.text,df_shame.text,df_fear.text],width,alpha=0.5 ,color = ["green","blue","red",'grey', 'lightcoral'])
plt.title("Tweet Counts of Different Emotions Worldwide") 
plt.ylabel("Frequency")
plt.xlabel("Sentiments")
plt.show()
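The five boolean filters above can be collapsed into one pass over the column; a sketch using pandas' `value_counts`, which returns the same per-emotion frequencies (the `df_demo` frame is an illustrative stand-in):

```python
import pandas as pd

df_demo = pd.DataFrame({'emotion': ['Happy', 'Sad', 'Happy', 'Fear']})
# One pass over the column instead of one filtered .count() per emotion.
emotion_counts = df_demo['emotion'].value_counts()
```

`emotion_counts.plot.bar()` then reproduces the frequency plot without the intermediate `df_happy`, `df_sad`, … frames.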

Tweet counts of different emotions in India

In [48]:
## plotting tweet counts for each emotion over India tweets
df_in=df[df['location']=='India']
df_happy=df_in[df_in['emotion']=='Happy'].count()
df_sad=df_in[df_in['emotion']=='Sad'].count()
df_angry=df_in[df_in['emotion']=='Angry'].count()
df_shame=df_in[df_in['emotion']=='Shame'].count()
df_fear=df_in[df_in['emotion']=='Fear'].count()
df_surprise=df_in[df_in['emotion']=='Surprise'].count()
width = 0.50  
plt.style.use('ggplot')
plt.figure(figsize=(10,8))
plt.bar(["Happy","Sad","Angry","Shame","Fear"],[df_happy.text,df_sad.text,df_angry.text,df_shame.text,df_fear.text],width,alpha=0.5 ,color = ["green","blue","red",'grey', 'lightcoral'])
plt.title("Tweet Counts of Different Emotions in India") 
plt.ylabel("Frequency")
plt.xlabel("Sentiments")
plt.show()

8. Pie plot of the percentage of each emotion sentiment in the tweets

Emotion sentiment percentages of worldwide tweets

In [49]:
#pie plot
df_happy=df[df['emotion']=='Happy'].count()
df_sad=df[df['emotion']=='Sad'].count()
df_angry=df[df['emotion']=='Angry'].count()
df_shame=df[df['emotion']=='Shame'].count()
df_fear=df[df['emotion']=='Fear'].count()
df_surprise=df[df['emotion']=='Surprise'].count()
df['count']= 1
fig = px.pie(df,title='Emotion percentage distribution of tweets worldwide', values='count', names='emotion',labels='emotion')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

Emotion sentiment percentages of India tweets

In [50]:
df_in=df[df['location']=='India']
df_happy=df_in[df_in['emotion']=='Happy'].count()
df_sad=df_in[df_in['emotion']=='Sad'].count()
df_angry=df_in[df_in['emotion']=='Angry'].count()
df_shame=df_in[df_in['emotion']=='Shame'].count()
df_fear=df_in[df_in['emotion']=='Fear'].count()
df_surprise=df_in[df_in['emotion']=='Surprise'].count()
#pie plot

df_in = df_in.copy()
df_in['count']= 1
fig = px.pie(df_in,title='Emotion percentage distribution of tweets over India', values='count', names='emotion',labels='emotion')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

9. Funnel plot of emotion sentiment distribution

Emotion sentiment distribution of worldwide tweets

In [51]:
temp = df.groupby('emotion').count()['text'].reset_index().sort_values(by='text',ascending=False)
fig = go.Figure(go.Funnelarea(
    text =temp.emotion,
    values = temp.text,
    title = {"position": "top center", "text": "Funnel-Chart of Emotion Distribution in worldwide tweets"}
    ))
fig.show()

Emotion sentiment distribution of India tweets

In [52]:
temp = df_in.groupby('emotion').count()['text'].reset_index().sort_values(by='text',ascending=False)
fig = go.Figure(go.Funnelarea(
    text =temp.emotion,
    values = temp.text,
    title = {"position": "top center", "text": "Funnel-Chart of Emotion Distribution in India tweets"}
    ))
fig.show()

10. Country-wise distribution of emotion sentiments for the top 10 countries

In [53]:
df_happy=df[df['emotion']=='Happy']
df_sad=df[df['emotion']=='Sad']
df_angry=df[df['emotion']=='Angry']
df_shame=df[df['emotion']=='Shame']
df_fear=df[df['emotion']=='Fear']

x=list(unique_count.location)
h=[]
s=[]
a=[]
sh=[]
f=[]
for i in unique_count.location:
    tempd=all_tweet_location[all_tweet_location['location']==i]
    h.append(len(tempd[tempd.emotion=='Happy']))
    s.append(len(tempd[tempd.emotion=='Sad']))
    a.append(len(tempd[tempd.emotion=='Angry']))
    sh.append(len(tempd[tempd.emotion=='Shame']))
    f.append(len(tempd[tempd.emotion=='Fear']))
fig = go.Figure(go.Bar(x=x, y=h, name='Happy'))
fig.add_trace(go.Bar(x=x, y=s, name='Sad'))
fig.add_trace(go.Bar(x=x, y=a, name='Angry'))
fig.add_trace(go.Bar(x=x, y=sh, name='Shame'))
fig.add_trace(go.Bar(x=x, y=f, name='Fear'))
fig.update_layout(barmode='stack',title_text='Emotion distribution of tweets by country',title_x=0.5)
fig.update_xaxes(categoryorder='total descending')
fig.show()
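The per-country loop above can also be written as a single `pd.crosstab`, which produces the same location × emotion count matrix in one call (the `df_demo` frame is an illustrative stand-in for `all_tweet_location`):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'location': ['India', 'India', 'US'],
    'emotion':  ['Happy', 'Sad', 'Happy'],
})
# Rows are locations, columns are emotions, cells are tweet counts (0-filled).
table = pd.crosstab(df_demo['location'], df_demo['emotion'])
```

Each column of `table` can then be passed directly as the `y` of a stacked `go.Bar` trace.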

11. Emotion sentiment distribution of tweets in India

In [54]:
df_in=df[df.location=='India']
df_happy=df_in[df_in['emotion']=='Happy']
df_sad=df_in[df_in['emotion']=='Sad']
df_angry=df_in[df_in['emotion']=='Angry']
df_shame=df_in[df_in['emotion']=='Shame']
df_fear=df_in[df_in['emotion']=='Fear']

x=['India']
h=[]
s=[]
a=[]
sh=[]
f=[]
tempd=all_tweet_location[all_tweet_location['location']=='India']
h.append(len(tempd[tempd.emotion=='Happy']))
s.append(len(tempd[tempd.emotion=='Sad']))
a.append(len(tempd[tempd.emotion=='Angry']))
sh.append(len(tempd[tempd.emotion=='Shame']))
f.append(len(tempd[tempd.emotion=='Fear']))
fig = go.Figure(go.Bar(x=x, y=h, name='Happy'))
fig.add_trace(go.Bar(x=x, y=s, name='Sad'))
fig.add_trace(go.Bar(x=x, y=a, name='Angry'))
fig.add_trace(go.Bar(x=x, y=sh, name='Shame'))
fig.add_trace(go.Bar(x=x, y=f, name='Fear'))
fig.update_layout(barmode='stack',title_text='Emotion sentiment distribution of tweets in India',title_x=0.5)
fig.show()

12. Word Cloud of different emotion sentiment words worldwide

In [55]:
df_happy=df[df['emotion']=='Happy']
df_sad=df[df['emotion']=='Sad']
df_angry=df[df['emotion']=='Angry']
df_shame=df[df['emotion']=='Shame']
df_fear=df[df['emotion']=='Fear']
df_surprise=df[df['emotion']=='Surprise']
from wordcloud import WordCloud


happy=""
sad=""
angry=""
shame=""
fear=""
for i in df_happy.text:
    happy+=i.strip()+" "
for i in df_sad.text:
    sad+=i.strip()+" "
for i in df_angry.text:
    angry+=i.strip()+" "
for i in df_shame.text:
    shame+=i.strip()+" "
for i in df_fear.text:
    fear+=i.strip()+" "
#print(s)

happy_split=happy.split()
sad_split=sad.split()
angry_split=angry.split()
shame_split=shame.split()
fear_split=fear.split()
common=set(happy_split) & set(sad_split) & set(angry_split) & set(shame_split) & set(fear_split)

happy_txt  = [word for word in happy_split if word not in common]
happy_txt = ' '.join(happy_txt)
sad_txt  = [word for word in sad_split if word not in common]
sad_txt = ' '.join(sad_txt)
angry_txt  = [word for word in angry_split if word not in common]
angry_txt = ' '.join(angry_txt)
shame_txt  = [word for word in shame_split if word not in common]
shame_txt = ' '.join(shame_txt)
fear_txt  = [word for word in fear_split if word not in common]
fear_txt = ' '.join(fear_txt)
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
def PlotWordCloud(words, title):
    wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white' 
                ).generate(words) 
                                                           
    # plot the WordCloud image                        
    plt.figure(figsize = (10, 10), facecolor = None) 
    plt.imshow(wordcloud) 
    plt.axis("off") 
    plt.tight_layout(pad = 0) 
    plt.title(title, fontsize=50)

    plt.show() 

PlotWordCloud(happy_txt, 'Most Happy tweet words')

 
PlotWordCloud(sad_txt, 'Most Sad tweet words')
 
PlotWordCloud(angry_txt, 'Most Angry tweet words')
   
PlotWordCloud(shame_txt, 'Most Shame tweet words')
  
PlotWordCloud(fear_txt, 'Most Fear tweet words')
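The filter above drops words shared by every emotion corpus, since such words carry no discriminating signal for a per-emotion word cloud; the core set operation in isolation, on toy corpora standing in for the concatenated tweet text:

```python
# Toy corpora standing in for the concatenated per-emotion tweet text.
happy = "lockdown over happy vaccine".split()
sad = "lockdown sad cases vaccine".split()

# Words present in every corpus are removed from each emotion's cloud.
common = set(happy) & set(sad)
happy_only = [w for w in happy if w not in common]
```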

13. Word Cloud of different emotion sentiment words in India

In [56]:
df_in=df[df['location']=='India']
df_happy=df_in[df_in['emotion']=='Happy']
df_sad=df_in[df_in['emotion']=='Sad']
df_angry=df_in[df_in['emotion']=='Angry']
df_shame=df_in[df_in['emotion']=='Shame']
df_fear=df_in[df_in['emotion']=='Fear']
df_surprise=df_in[df_in['emotion']=='Surprise']
from wordcloud import WordCloud


happy=""
sad=""
angry=""
shame=""
fear=""
for i in df_happy.text:
    happy+=i.strip()+" "
for i in df_sad.text:
    sad+=i.strip()+" "
for i in df_angry.text:
    angry+=i.strip()+" "
for i in df_shame.text:
    shame+=i.strip()+" "
for i in df_fear.text:
    fear+=i.strip()+" "
#print(s)

happy_split=happy.split()
sad_split=sad.split()
angry_split=angry.split()
shame_split=shame.split()
fear_split=fear.split()
common=set(happy_split) & set(sad_split) & set(angry_split) & set(shame_split) & set(fear_split)

happy_txt  = [word for word in happy_split if word not in common]
happy_txt = ' '.join(happy_txt)
sad_txt  = [word for word in sad_split if word not in common]
sad_txt = ' '.join(sad_txt)
angry_txt  = [word for word in angry_split if word not in common]
angry_txt = ' '.join(angry_txt)
shame_txt  = [word for word in shame_split if word not in common]
shame_txt = ' '.join(shame_txt)
fear_txt  = [word for word in fear_split if word not in common]
fear_txt = ' '.join(fear_txt)
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
def PlotWordCloud(words, title):
    wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white' 
                ).generate(words) 
                                                           
    # plot the WordCloud image                        
    plt.figure(figsize = (10, 10), facecolor = None) 
    plt.imshow(wordcloud) 
    plt.axis("off") 
    plt.tight_layout(pad = 0) 
    plt.title(title, fontsize=50)

    plt.show() 

PlotWordCloud(happy_txt, 'Most Happy tweet words')

 
PlotWordCloud(sad_txt, 'Most Sad tweet words')
 
PlotWordCloud(angry_txt, 'Most Angry tweet words')
   
PlotWordCloud(shame_txt, 'Most Shame tweet words')
  
PlotWordCloud(fear_txt, 'Most Fear tweet words')

14. Word Cloud of different emotion sentiment hashtags in worldwide tweets

In [57]:
df['hashtags'] = df['hashtags'].str.replace(r'[^\w\s]',' ',regex=True).str.replace(r'\s\s+', ' ',regex=True)

df_happy=df[df['emotion']=='Happy']
df_sad=df[df['emotion']=='Sad']
df_angry=df[df['emotion']=='Angry']
df_shame=df[df['emotion']=='Shame']
df_fear=df[df['emotion']=='Fear']
df_surprise=df[df['emotion']=='Surprise']
from wordcloud import WordCloud


happy=""
sad=""
angry=""
shame=""
fear=""
for i in df_happy.hashtags:
    happy+=i.strip()+" "
for i in df_sad.hashtags:
    sad+=i.strip()+" "
for i in df_angry.hashtags:
    angry+=i.strip()+" "
for i in df_shame.hashtags:
    shame+=i.strip()+" "
for i in df_fear.hashtags:
    fear+=i.strip()+" "
#print(s)

happy_split=happy.split()
sad_split=sad.split()
angry_split=angry.split()
shame_split=shame.split()
fear_split=fear.split()
common=set(happy_split) & set(sad_split) & set(angry_split) & set(shame_split) & set(fear_split)

happy_hashtag  = [word for word in happy_split if word not in common]
happy_hashtag = ' '.join(happy_hashtag)
sad_hashtag  = [word for word in sad_split if word not in common]
sad_hashtag = ' '.join(sad_hashtag)
angry_hashtag  = [word for word in angry_split if word not in common]
angry_hashtag = ' '.join(angry_hashtag)
shame_hashtag  = [word for word in shame_split if word not in common]
shame_hashtag = ' '.join(shame_hashtag)
fear_hashtag  = [word for word in fear_split if word not in common]
fear_hashtag = ' '.join(fear_hashtag)
# lower max_font_size, change the maximum number of word and lighten the background:
def PlotWordCloud(words, title):
    wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white' 
                ).generate(words) 
                                                           
    # plot the WordCloud image                        
    plt.figure(figsize = (10, 10), facecolor = None) 
    plt.imshow(wordcloud) 
    plt.axis("off") 
    plt.tight_layout(pad = 0) 
    plt.title(title, fontsize=50)

    plt.show() 
    
    
PlotWordCloud(happy_hashtag, 'Most Happy hashtags')
PlotWordCloud(sad_hashtag, 'Most Sad hashtags')
PlotWordCloud(angry_hashtag, 'Most Angry hashtags')
PlotWordCloud(shame_hashtag, 'Most Shame hashtags')
PlotWordCloud(fear_hashtag, 'Most Fear hashtags')

15. Word Cloud of different emotion sentiment hashtags in India tweets

In [58]:
df['hashtags'] = df['hashtags'].str.replace(r'[^\w\s]',' ',regex=True).str.replace(r'\s\s+', ' ',regex=True)
df_in=df[df['location']=='India']
df_happy=df_in[df_in['emotion']=='Happy']
df_sad=df_in[df_in['emotion']=='Sad']
df_angry=df_in[df_in['emotion']=='Angry']
df_shame=df_in[df_in['emotion']=='Shame']
df_fear=df_in[df_in['emotion']=='Fear']
df_surprise=df_in[df_in['emotion']=='Surprise']
from wordcloud import WordCloud


happy=""
sad=""
angry=""
shame=""
fear=""
for i in df_happy.hashtags:
    happy+=i.strip()+" "
for i in df_sad.hashtags:
    sad+=i.strip()+" "
for i in df_angry.hashtags:
    angry+=i.strip()+" "
for i in df_shame.hashtags:
    shame+=i.strip()+" "
for i in df_fear.hashtags:
    fear+=i.strip()+" "
#print(s)

happy_split=happy.split()
sad_split=sad.split()
angry_split=angry.split()
shame_split=shame.split()
fear_split=fear.split()
common=set(happy_split) & set(sad_split) & set(angry_split) & set(shame_split) & set(fear_split)

happy_hashtag  = [word for word in happy_split if word not in common]
happy_hashtag = ' '.join(happy_hashtag)
sad_hashtag  = [word for word in sad_split if word not in common]
sad_hashtag = ' '.join(sad_hashtag)
angry_hashtag  = [word for word in angry_split if word not in common]
angry_hashtag = ' '.join(angry_hashtag)
shame_hashtag  = [word for word in shame_split if word not in common]
shame_hashtag = ' '.join(shame_hashtag)
fear_hashtag  = [word for word in fear_split if word not in common]
fear_hashtag = ' '.join(fear_hashtag)
# lower max_font_size, change the maximum number of word and lighten the background:
def PlotWordCloud(words, title):
    wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white' 
                ).generate(words) 
                                                           
    # plot the WordCloud image                        
    plt.figure(figsize = (10, 10), facecolor = None) 
    plt.imshow(wordcloud) 
    plt.axis("off") 
    plt.tight_layout(pad = 0) 
    plt.title(title, fontsize=50)

    plt.show() 
    
    
PlotWordCloud(happy_hashtag, 'Most Happy hashtags')
PlotWordCloud(sad_hashtag, 'Most Sad hashtags')
PlotWordCloud(angry_hashtag, 'Most Angry hashtags')
PlotWordCloud(shame_hashtag, 'Most Shame hashtags')
PlotWordCloud(fear_hashtag, 'Most Fear hashtags')

16. Most frequent words per emotion: bar and tree plots over worldwide tweets

In [59]:
df_happy=df[df['emotion']=='Happy']
df_sad=df[df['emotion']=='Sad']
df_angry=df[df['emotion']=='Angry']
df_shame=df[df['emotion']=='Shame']
df_fear=df[df['emotion']=='Fear']
words_in_happy_tweet = [tweet.lower().split() for tweet in df_happy.text]
words_in_sad_tweet = [tweet.lower().split() for tweet in df_sad.text]
words_in_angry_tweet = [tweet.lower().split() for tweet in df_angry.text]
words_in_shame_tweet = [tweet.lower().split() for tweet in df_shame.text]
words_in_fear_tweet = [tweet.lower().split() for tweet in df_fear.text]

happy_words_no_urls = list(itertools.chain(*words_in_happy_tweet))
sad_words_no_urls = list(itertools.chain(*words_in_sad_tweet))
angry_words_no_urls = list(itertools.chain(*words_in_angry_tweet))
shame_words_no_urls = list(itertools.chain(*words_in_shame_tweet))
fear_words_no_urls = list(itertools.chain(*words_in_fear_tweet))

counts_no_happy = collections.Counter(happy_words_no_urls)
counts_no_sad = collections.Counter(sad_words_no_urls)
counts_no_angry = collections.Counter(angry_words_no_urls)
counts_no_shame = collections.Counter(shame_words_no_urls)
counts_no_fear = collections.Counter(fear_words_no_urls)

clean_tweets_no_happy = pd.DataFrame(counts_no_happy.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_sad = pd.DataFrame(counts_no_sad.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_angry = pd.DataFrame(counts_no_angry.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_shame = pd.DataFrame(counts_no_shame.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_fear = pd.DataFrame(counts_no_fear.most_common(15),
                             columns=['words', 'count'])

fig, ax = plt.subplots(figsize=(12, 8))

# Plot horizontal bar graph
clean_tweets_no_happy.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="green", alpha=0.7)
ax.set_title("Common Words Found in Happy Tweets (Including All Words)")
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_sad.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="blue", alpha=0.7)
ax.set_title("Common Words Found in Sad Tweets (Including All Words)")
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_angry.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="red", alpha=0.7)
ax.set_title("Common Words Found in Angry Tweets (Including All Words)")
plt.show() 
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_shame.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="grey", alpha=0.7)
ax.set_title("Common Words Found in Shame Tweets (Including All Words)")
plt.show() 
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_fear.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="purple", alpha=0.7)
ax.set_title("Common Words Found in Fear Tweets (Including All Words)")
plt.show() 

from collections import Counter
def random_colours(number_of_colors):
    '''
    Simple function for random colours generation.
    Input:
        number_of_colors - integer value indicating the number of colours which are going to be generated.
    Output:
        Color in the following format: ['#E86DA4'] .
    '''
    colors = []
    for i in range(number_of_colors):
        colors.append("#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)]))
    return colors


fig = px.treemap(clean_tweets_no_happy.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Happy emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_sad.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Sad emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_angry.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Angry emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_shame.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Shame emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_fear.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Fear emotion tweets')
fig.show()

17. Most frequent words per emotion: bar and tree plots over India tweets

In [60]:
df_in=df[df['location']=='India']
df_happy=df_in[df_in['emotion']=='Happy']
df_sad=df_in[df_in['emotion']=='Sad']
df_angry=df_in[df_in['emotion']=='Angry']
df_shame=df_in[df_in['emotion']=='Shame']
df_fear=df_in[df_in['emotion']=='Fear']
words_in_happy_tweet = [tweet.lower().split() for tweet in df_happy.text]
words_in_sad_tweet = [tweet.lower().split() for tweet in df_sad.text]
words_in_angry_tweet = [tweet.lower().split() for tweet in df_angry.text]
words_in_shame_tweet = [tweet.lower().split() for tweet in df_shame.text]
words_in_fear_tweet = [tweet.lower().split() for tweet in df_fear.text]

happy_words_no_urls = list(itertools.chain(*words_in_happy_tweet))
sad_words_no_urls = list(itertools.chain(*words_in_sad_tweet))
angry_words_no_urls = list(itertools.chain(*words_in_angry_tweet))
shame_words_no_urls = list(itertools.chain(*words_in_shame_tweet))
fear_words_no_urls = list(itertools.chain(*words_in_fear_tweet))

counts_no_happy = collections.Counter(happy_words_no_urls)
counts_no_sad = collections.Counter(sad_words_no_urls)
counts_no_angry = collections.Counter(angry_words_no_urls)
counts_no_shame = collections.Counter(shame_words_no_urls)
counts_no_fear = collections.Counter(fear_words_no_urls)

clean_tweets_no_happy = pd.DataFrame(counts_no_happy.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_sad = pd.DataFrame(counts_no_sad.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_angry = pd.DataFrame(counts_no_angry.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_shame = pd.DataFrame(counts_no_shame.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_fear = pd.DataFrame(counts_no_fear.most_common(15),
                             columns=['words', 'count'])

fig, ax = plt.subplots(figsize=(12, 8))

# Plot horizontal bar graph
clean_tweets_no_happy.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="green", alpha=0.7)
ax.set_title("Common Words Found in Happy Tweets (Including All Words)")
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_sad.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="blue", alpha=0.7)
ax.set_title("Common Words Found in Sad Tweets (Including All Words)")
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_angry.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="red", alpha=0.7)
ax.set_title("Common Words Found in Angry Tweets (Including All Words)")
plt.show() 
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_shame.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="grey", alpha=0.7)
ax.set_title("Common Words Found in Shame Tweets (Including All Words)")
plt.show() 
fig, ax = plt.subplots(figsize=(12, 8))
clean_tweets_no_fear.sort_values(by='count').plot.barh(x='words', y='count', ax=ax, color="purple", alpha=0.7)
ax.set_title("Common Words Found in Fear Tweets (Including All Words)")
plt.show() 

from collections import Counter
def random_colours(number_of_colors):
    '''
    Simple function for random colours generation.
    Input:
        number_of_colors - integer value indicating the number of colours which are going to be generated.
    Output:
        Color in the following format: ['#E86DA4'] .
    '''
    colors = []
    for i in range(number_of_colors):
        colors.append("#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)]))
    return colors


fig = px.treemap(clean_tweets_no_happy.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Happy emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_sad.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Sad emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_angry.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Angry emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_shame.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Shame emotion tweets')
fig.show()
fig = px.treemap(clean_tweets_no_fear.sort_values(by='count'), path=['words'], values='count',title='Tree of Most Common Words in Fear emotion tweets')
fig.show()

18. DoNut visualisation of words of different emotion sentiments worldwide

In [61]:
from palettable.colorbrewer.qualitative import Pastel1_7

df_happy=df[df['emotion']=='Happy']
df_sad=df[df['emotion']=='Sad']
df_angry=df[df['emotion']=='Angry']
df_shame=df[df['emotion']=='Shame']
df_fear=df[df['emotion']=='Fear']
words_in_happy_tweet = [tweet.lower().split() for tweet in df_happy.text]
words_in_sad_tweet = [tweet.lower().split() for tweet in df_sad.text]
words_in_angry_tweet = [tweet.lower().split() for tweet in df_angry.text]
words_in_shame_tweet = [tweet.lower().split() for tweet in df_shame.text]
words_in_fear_tweet = [tweet.lower().split() for tweet in df_fear.text]

happy_words_no_urls = list(itertools.chain(*words_in_happy_tweet))
sad_words_no_urls = list(itertools.chain(*words_in_sad_tweet))
angry_words_no_urls = list(itertools.chain(*words_in_angry_tweet))
shame_words_no_urls = list(itertools.chain(*words_in_shame_tweet))
fear_words_no_urls = list(itertools.chain(*words_in_fear_tweet))

counts_no_happy = collections.Counter(happy_words_no_urls)
counts_no_sad = collections.Counter(sad_words_no_urls)
counts_no_angry = collections.Counter(angry_words_no_urls)
counts_no_shame = collections.Counter(shame_words_no_urls)
counts_no_fear = collections.Counter(fear_words_no_urls)

clean_tweets_no_happy = pd.DataFrame(counts_no_happy.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_sad = pd.DataFrame(counts_no_sad.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_angry = pd.DataFrame(counts_no_angry.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_shame = pd.DataFrame(counts_no_shame.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_fear = pd.DataFrame(counts_no_fear.most_common(15),
                             columns=['words', 'count'])

pt=clean_tweets_no_happy.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Happy Words')
plt.show()

pt=clean_tweets_no_sad.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Sad Words')
plt.show()

pt=clean_tweets_no_angry.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Angry Words')
plt.show()

pt=clean_tweets_no_shame.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Shame Emotion Words')
plt.show()

pt=clean_tweets_no_fear.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Fear Emotion Words')
plt.show()
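The five near-identical donut-plot blocks above could be collapsed into a single loop over the emotion Counters. A minimal sketch, assuming the `counts_no_*` Counters from the cell above (here replaced by small hypothetical Counters, and with the `palettable` colour list omitted for self-containment):

```python
import collections
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the counts_no_* Counters built above
emotion_counters = {
    'Happy': collections.Counter({'hope': 5, 'smile': 3}),
    'Sad': collections.Counter({'loss': 4, 'cry': 2}),
}

for emotion, counts in emotion_counters.items():
    # top-15 words, sorted ascending on count as in the original cells
    pt = (pd.DataFrame(counts.most_common(15), columns=['words', 'count'])
            .sort_values(by='count'))
    plt.figure(figsize=(16, 10))
    plt.pie(pt['count'], labels=pt['words'])
    # a white circle over the centre turns the pie into a donut
    plt.gca().add_artist(plt.Circle((0, 0), 0.7, color='white'))
    plt.title('DoNut Plot Of Unique {} Words'.format(emotion))
    plt.savefig('donut_{}.png'.format(emotion.lower()))
    plt.close()
```

This keeps one copy of the plotting logic, so a tweak (e.g. wedge colours or figure size) applies to every emotion at once.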

19. DoNut Visualisation of Words of Different Emotions over India Tweets

In [62]:
from palettable.colorbrewer.qualitative import Pastel1_7

# restrict each emotion frame to tweets located in India
words_in_happy_tweet = [tweet.lower().split() for tweet in df_happy[df_happy.location == 'India'].text]
words_in_sad_tweet = [tweet.lower().split() for tweet in df_sad[df_sad.location == 'India'].text]
words_in_angry_tweet = [tweet.lower().split() for tweet in df_angry[df_angry.location == 'India'].text]
words_in_shame_tweet = [tweet.lower().split() for tweet in df_shame[df_shame.location == 'India'].text]
words_in_fear_tweet = [tweet.lower().split() for tweet in df_fear[df_fear.location == 'India'].text]

happy_words_no_urls = list(itertools.chain(*words_in_happy_tweet))
sad_words_no_urls = list(itertools.chain(*words_in_sad_tweet))
angry_words_no_urls = list(itertools.chain(*words_in_angry_tweet))
shame_words_no_urls = list(itertools.chain(*words_in_shame_tweet))
fear_words_no_urls = list(itertools.chain(*words_in_fear_tweet))

counts_no_happy = collections.Counter(happy_words_no_urls)
counts_no_sad = collections.Counter(sad_words_no_urls)
counts_no_angry = collections.Counter(angry_words_no_urls)
counts_no_shame = collections.Counter(shame_words_no_urls)
counts_no_fear = collections.Counter(fear_words_no_urls)

clean_tweets_no_happy = pd.DataFrame(counts_no_happy.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_sad = pd.DataFrame(counts_no_sad.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_angry = pd.DataFrame(counts_no_angry.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_shame = pd.DataFrame(counts_no_shame.most_common(15),
                             columns=['words', 'count'])
clean_tweets_no_fear = pd.DataFrame(counts_no_fear.most_common(15),
                             columns=['words', 'count'])

pt=clean_tweets_no_happy.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Happy Words (India)')
plt.show()

pt=clean_tweets_no_sad.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Sad Words (India)')
plt.show()

pt=clean_tweets_no_angry.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Angry Words (India)')
plt.show()

pt=clean_tweets_no_shame.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Shame Emotion Words (India)')
plt.show()

pt=clean_tweets_no_fear.sort_values(by='count')
plt.figure(figsize=(16,10))
my_circle=plt.Circle((0,0), 0.7, color='white')
plt.rcParams['text.color'] = 'black'
plt.pie(pt['count'], labels=pt.words, colors=Pastel1_7.hex_colors)
p=plt.gcf()
p.gca().add_artist(my_circle)
plt.title('DoNut Plot Of Unique Fear Emotion Words (India)')
plt.show()
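The `*_no_urls` variable names suggest URL removal, but the cells above count raw tokens. A small filter applied before counting would make the names accurate; the sketch below uses a tiny illustrative stopword set (the `nltk.corpus.stopwords` list imported during cleaning could be substituted):

```python
import collections
import itertools
import re

# small illustrative stopword set; nltk's English list could be used instead
STOPWORDS = {'the', 'a', 'is', 'to', 'and', 'of'}
URL_RE = re.compile(r'https?://\S+')

def tokenize_clean(texts):
    """Lower-case, strip URLs, split on whitespace, and drop stopwords."""
    tokenised = []
    for text in texts:
        text = URL_RE.sub('', text.lower())
        tokenised.append([w for w in text.split() if w not in STOPWORDS])
    return list(itertools.chain(*tokenised))

tweets = ['Stay safe and wear a mask https://t.co/abc',
          'The lockdown is hard to bear']
counts = collections.Counter(tokenize_clean(tweets))
```

Without such filtering, the donut plots tend to be dominated by stopwords and link fragments rather than emotion-bearing vocabulary.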

20. Emotion Sentiment Change Datewise of World Tweets

In [63]:
df_happy=df[df['emotion']=='Happy'].groupby(["date"],as_index = False).count()
df_sad=df[df['emotion']=='Sad'].groupby(["date"],as_index = False).count()
df_angry=df[df['emotion']=='Angry'].groupby(["date"],as_index = False).count()
df_shame=df[df['emotion']=='Shame'].groupby(["date"],as_index = False).count()
df_fear=df[df['emotion']=='Fear'].groupby(["date"],as_index = False).count()

plt.subplots(1, figsize=(10, 8))
plt.plot(df_happy["date"],df_happy['text'],color="yellow")
plt.plot(df_sad["date"],df_sad["text"],color="blue")
plt.plot(df_angry["date"],df_angry["text"],color="red")
plt.plot(df_shame["date"],df_shame["text"],color="grey")
plt.plot(df_fear["date"],df_fear["text"],color="purple")
plt.legend(["Happy", "Sad","Angry","Shame","Fear"])
plt.title("Emotion Sentiment Analysis DateWise")
plt.xlabel("Dates")
plt.ylabel("Frequency")
plt.show()
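One caveat for all the datewise plots: values like `22-Sep` in the `date` column are plain strings, so matplotlib orders them by first appearance rather than chronologically. Parsing them into datetimes before grouping fixes the x-axis ordering. A sketch with a hypothetical mini-frame mirroring the cleaned data (the year must be attached explicitly, since the scraped dates carry none; 2020 is assumed here):

```python
import pandas as pd

# Hypothetical mini-frame mirroring the cleaned tweet data
df = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd'],
    'date': ['22-Sep', '3-Oct', '30-Sep', '22-Sep'],
})

# '22-Sep' has no year, so attach one explicitly before parsing
df['date'] = pd.to_datetime(df['date'] + '-2020', format='%d-%b-%Y')

# groupby now sorts chronologically: 22-Sep, 30-Sep, 3-Oct
daily = df.groupby('date', as_index=False).count()
```

Lexicographic ordering would have put `3-Oct` before `30-Sep`; after parsing, the line and scatter plots read left-to-right in time.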

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook



fig, ax = plt.subplots(1, figsize=(10, 8))
ax.scatter(df_happy["date"],df_happy['text'],color="yellow", s=df_happy['text'], alpha=0.5)
ax.scatter(df_sad["date"],df_sad["text"],color="blue", s=df_sad["text"], alpha=0.5)
ax.scatter(df_angry["date"],df_angry["text"],color="red", s=df_angry["text"], alpha=0.5)
ax.scatter(df_shame["date"],df_shame["text"],color="grey", s=df_shame["text"], alpha=0.5)
ax.scatter(df_fear["date"],df_fear["text"],color="purple", s=df_fear["text"], alpha=0.5)

ax.set_xlabel('Dates', fontsize=15)
ax.set_ylabel('Count', fontsize=15)
ax.set_title('Emotion Sentiment change Datewise')

ax.grid(True)
fig.tight_layout()

plt.show()

21. Emotion Sentiment Change Datewise of India Tweets

In [64]:
# filter to India without overwriting the full dataframe
df_india = df[df['location']=='India']
df_happy=df_india[df_india['emotion']=='Happy'].groupby(["date"],as_index = False).count()
df_sad=df_india[df_india['emotion']=='Sad'].groupby(["date"],as_index = False).count()
df_angry=df_india[df_india['emotion']=='Angry'].groupby(["date"],as_index = False).count()
df_shame=df_india[df_india['emotion']=='Shame'].groupby(["date"],as_index = False).count()
df_fear=df_india[df_india['emotion']=='Fear'].groupby(["date"],as_index = False).count()

plt.subplots(1, figsize=(10, 8))
plt.plot(df_happy["date"],df_happy['text'],color="yellow")
plt.plot(df_sad["date"],df_sad["text"],color="blue")
plt.plot(df_angry["date"],df_angry["text"],color="red")
plt.plot(df_shame["date"],df_shame["text"],color="grey")
plt.plot(df_fear["date"],df_fear["text"],color="purple")
plt.legend(["Happy", "Sad","Angry","Shame","Fear"])
plt.title("Emotion Sentiment Analysis DateWise")
plt.xlabel("Dates")
plt.ylabel("Frequency")
plt.show()

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook



fig, ax = plt.subplots(1, figsize=(10, 8))
ax.scatter(df_happy["date"],df_happy['text'],color="yellow", s=df_happy['text'], alpha=0.5)
ax.scatter(df_sad["date"],df_sad["text"],color="blue", s=df_sad["text"], alpha=0.5)
ax.scatter(df_angry["date"],df_angry["text"],color="red", s=df_angry["text"], alpha=0.5)
ax.scatter(df_shame["date"],df_shame["text"],color="grey", s=df_shame["text"], alpha=0.5)
ax.scatter(df_fear["date"],df_fear["text"],color="purple", s=df_fear["text"], alpha=0.5)

ax.set_xlabel('Dates', fontsize=15)
ax.set_ylabel('Count', fontsize=15)
ax.set_title('Emotion Sentiment change Datewise (India)')

ax.grid(True)
fig.tight_layout()

plt.show()

22. Per-day Tweet Counts of Different Emotions over World Tweets

In [65]:

df_happy=all_tweet_location[all_tweet_location['emotion']=='Happy'].groupby(["date"],as_index = False).count()
df_sad=all_tweet_location[all_tweet_location['emotion']=='Sad'].groupby(["date"],as_index = False).count()
df_angry=all_tweet_location[all_tweet_location['emotion']=='Angry'].groupby(["date"],as_index = False).count()
df_shame=all_tweet_location[all_tweet_location['emotion']=='Shame'].groupby(["date"],as_index = False).count()
df_fear=all_tweet_location[all_tweet_location['emotion']=='Fear'].groupby(["date"],as_index = False).count()

fig = go.Figure()
# each trace uses its own emotion frame for both x and y
fig.add_trace(go.Scatter(x=df_happy['date'], y=df_happy['count'],
                         mode='lines+markers', name='Happy'))
fig.add_trace(go.Scatter(x=df_sad['date'], y=df_sad['count'],
                         mode='lines+markers', name='Sad'))
fig.add_trace(go.Scatter(x=df_angry['date'], y=df_angry['count'],
                         mode='lines+markers', name='Angry'))
fig.add_trace(go.Scatter(x=df_shame['date'], y=df_shame['count'],
                         mode='lines+markers', name='Shame'))
fig.add_trace(go.Scatter(x=df_fear['date'], y=df_fear['count'],
                         mode='lines+markers', name='Fear'))

fig.update_layout(
    title_text='Different Emotion Tweets per Day Worldwide : ({} - {})'.format(df_happy['date'].min(),
                                                       df_happy['date'].max()),template="plotly_dark",
    title_x=0.5)

fig.show()

23. Per-day Tweet Counts of Different Emotions over India Tweets

In [66]:
# filter to India without overwriting the full dataframe
india_tweet_location=all_tweet_location[all_tweet_location['location']=='India']

df_happy=india_tweet_location[india_tweet_location['emotion']=='Happy'].groupby(["date"],as_index = False).count()
df_sad=india_tweet_location[india_tweet_location['emotion']=='Sad'].groupby(["date"],as_index = False).count()
df_angry=india_tweet_location[india_tweet_location['emotion']=='Angry'].groupby(["date"],as_index = False).count()
df_shame=india_tweet_location[india_tweet_location['emotion']=='Shame'].groupby(["date"],as_index = False).count()
df_fear=india_tweet_location[india_tweet_location['emotion']=='Fear'].groupby(["date"],as_index = False).count()

fig = go.Figure()
# each trace uses its own emotion frame for both x and y
fig.add_trace(go.Scatter(x=df_happy['date'], y=df_happy['count'],
                         mode='lines+markers', name='Happy'))
fig.add_trace(go.Scatter(x=df_sad['date'], y=df_sad['count'],
                         mode='lines+markers', name='Sad'))
fig.add_trace(go.Scatter(x=df_angry['date'], y=df_angry['count'],
                         mode='lines+markers', name='Angry'))
fig.add_trace(go.Scatter(x=df_shame['date'], y=df_shame['count'],
                         mode='lines+markers', name='Shame'))
fig.add_trace(go.Scatter(x=df_fear['date'], y=df_fear['count'],
                         mode='lines+markers', name='Fear'))

fig.update_layout(
    title_text='Different Emotion Tweets per Day in India : ({} - {})'.format(df_happy['date'].min(),
                                                       df_happy['date'].max()),template="plotly_dark",
    title_x=0.5)

fig.show()

4. Conclusion

Sentiment analysis (also known as opinion mining) is the computational treatment of the opinions, sentiments, and subjectivity expressed in text. It aims to determine the attitude of a speaker or writer towards a particular topic, or the overall contextual polarity of a document, and it can reveal the emotions and inclinations of a large population towards a specific topic, item, or entity.

We used an LSTM model to predict the sentiment of the tweets and achieved an accuracy of 94%. Our analysis found that, worldwide, approximately 34.6% of tweets were happy, 31.9% fearful, 17% sad, 12.9% angry, and 3.68% ashamed; for India the figures were 39% happy, 28.5% fearful, 17.4% sad, 11.5% angry, and 3.6% ashamed. In both cases a majority of tweets express negative emotions, suggesting that during COVID people in India, and across the world, were under mental strain and frequently posted negative tweets on Twitter.
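Percentage breakdowns like these can be reproduced directly from the labelled frame with `value_counts`. A sketch with a small hypothetical `emotion` column standing in for the model's predictions:

```python
import pandas as pd

# Hypothetical labelled tweets mirroring the 'emotion' column
df = pd.DataFrame({'emotion': ['Happy', 'Fear', 'Happy', 'Sad', 'Angry',
                               'Happy', 'Fear', 'Sad', 'Shame', 'Fear']})

# share of each emotion as a percentage of all tweets
shares = (df['emotion'].value_counts(normalize=True) * 100).round(1)
```

Filtering the frame first (e.g. `df[df['location'] == 'India']`) yields the India-specific shares the same way.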
